Multiple Camera Video System Which Displays Selected Images

Field of the Invention:

The present invention relates to transmitting video information and more particularly to systems for streaming and displaying video images.

Background of the Invention:

In many situations, a scene or object is captured by multiple cameras, each of which captures the scene or object from a different angle or perspective. For example, at an athletic event multiple cameras, each at a different location, capture the action on the playing field. While each of the cameras is viewing the same event, the image available from each camera is different because each camera views the event from a different angle and location. Such images cannot in general be seamed into a single panoramic image.

The technology for streaming video over the Internet is well developed. Streaming video over the Internet, that is, transmitting a series of images, requires a substantial amount of bandwidth. Transmitting multiple streams of images (e.g. images from multiple separate cameras) or transmitting a stream of panoramic images requires an exceptionally large amount of bandwidth.

A common practice in situations where an event such as a sporting event is captured with multiple cameras is to utilize an editor or technician in a control room to select the best view at each instant. This single view is transmitted and presented on a single screen to users who are observing the event. There are also a number of known techniques for presenting multiple views on a single screen. In one known technique, multiple images are combined into a single combined image which is transmitted and presented to users. With another technique the streams from the different cameras remain distinct and multiple streams are transmitted to a user who then selects the desired stream for viewing. Each of the techniques which stream multiple images requires a relatively large amount of bandwidth. The present invention is directed to making multiple streams available to a user without using an undue amount of bandwidth.

Summary of the Invention:

The present invention provides a system for capturing multiple images from multiple cameras and selectively presenting desired views to a user. Multiple streams of data are streamed to a user's terminal. One data stream (called a thumbnail stream) is used to tell the user what image streams are available. In this stream, each image is transmitted as a low resolution thumbnail. One thumbnail is transmitted for each camera and the thumbnails are presented as small images on the user's screen. The thumbnail stream uses a relatively small amount of bandwidth. Another data stream (called the focus stream) contains a series of high resolution images from a selected camera. The images transmitted in this stream are displayed in a relatively large area on the viewer's screen. A user can switch the focus stream to contain images from any particular camera by clicking on the associated thumbnail.

In an alternate embodiment, in addition to the thumbnails from individual cameras, a user is also provided with a thumbnail of a panoramic image (e.g. a full 360 degree panorama or a portion thereof) which combines into a single image the images from multiple cameras. By clicking at a position on the panoramic thumbnail, the focus stream is switched to an image from a viewpoint or view window located at the point in the panorama where the user clicked.

In other alternate embodiments a variety of other data streams are also sent to the user. The other data streams sent to the user can contain (a) audio data, (b) interactivity markup data which describes regions of the image which provide interactivity opportunities such as hotspots, (c) presentation markup data which defines how data is presented on the user's screen, and (d) a telemetry data stream which can be used for various statistical data.

In still another embodiment one data stream contains a low quality base image for each data stream. The base images serve as the thumbnail images. A second data stream contains data that is added to a particular base stream to increase the quality of this particular stream and to create the focus stream.

Brief Description of Drawings:

Figure 1 is an overall high level diagram of a first embodiment of the invention.
Figure 2 illustrates the view on a user's display screen.
Figure 3 is a block diagram of a first embodiment of the invention.
Figure 3A illustrates how the thumbnail data stream is constructed.
Figure 4A illustrates how the user interacts with the system.
Figures 4B to 4F show in more detail elements shown in Figure 4A.
Figure 5 illustrates how clips are selected.
Figure 6 is an overview of the production process.
Figure 7 is a system overview diagram.
Figure 8 illustrates the clip production process.
Figure 9 illustrates the display on a user's display with an alternate embodiment of the invention.
Figure 10 illustrates an embodiment of the invention which includes additional data streams.
Figures 11 and 11A illustrate an embodiment of the invention where the thumbnail images are transmitted and displayed with the focus view.
Figure 12 illustrates the interaction between the client and the server over time.

Detailed Description:

An overall diagram of a first relatively simplified embodiment of the invention is shown in Figure 1. In the first embodiment of the invention, an event 100 is viewed and recorded by the four cameras 102A to 102D. The event 100 may for example be a baseball game. The images from cameras 102A to 102D are captured and edited by system 110. System 110 creates two streams of video data. One stream consists of the images captured by one selected camera. The second stream consists of "thumbnails" (i.e. small low resolution images) of the images captured by each of the four cameras 102A to 102D.



WO 01/89221    PCT/US01/16289



The two video streams are sent to a user terminal and display 111. The images visible to the user are illustrated in Figure 2. A major portion of the display is taken by the images from one particular camera. This is termed the focus stream. On the side of the display are four thumbnail images, one of which is associated with each of the cameras 102A to 102D. It is noted that the focus stream requires a substantial amount of bandwidth. The four thumbnail images have a lower resolution and all four thumbnail images can be transmitted as a single data stream. Examples of the bandwidth used by various data streams are given below.

Figure 3 illustrates the components in a system used to practice the invention and shows how the user interacts with the system. Camera system 300 (which includes cameras 102A to 102D) provides images to unit 301, which edits the image streams and creates the thumbnail image stream. The amount of editing depends on the application and will be discussed in detail later. Figure 3A illustrates how the thumbnail data stream is created. The data stream from each camera and the thumbnail data stream are provided to stream control 302.

The user 306 can see a display 304. An example of what appears on display 304 is shown in Figure 2. The user has an input device (for example a mouse) and when the user "clicks on" any one of the thumbnails, viewer software 303 sends a message to stream control 302. Thereafter images from the camera associated with the thumbnail which was clicked are transmitted as the focus stream.

Figure 3A is a block diagram of the program that creates the thumbnail data stream. First, as indicated by block 331, a low resolution version of each data stream is created. Low resolution images can, for example, be created by selecting and using only every fourth pixel in each image. Creating the low resolution image in effect shrinks the size of the images. As indicated by block 332, if desired the frame rate can be reduced by eliminating frames in order to further reduce the bandwidth required. The exact amount that the resolution is reduced depends on the particular application and on the amount of bandwidth available. In general a reduction in total pixel count of at least five to one is possible and sufficient. Finally, as indicated by block 333, the corresponding thumbnail images from each data stream are placed next to each other to form composite images. The stream of these composite images is the thumbnail data stream. It should be noted that while the thumbnails are next to each other in the data stream, when they are displayed on the client machine they can be displayed in any desired location on the display screen.
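
The three steps of Figure 3A can be sketched in Python as follows. This is a minimal illustration under stated assumptions and not the program of the first embodiment: frames are represented as nested lists of pixel values, the names downsample, drop_frames, composite and thumbnail_stream are hypothetical, and keeping every second pixel in each direction (one pixel in four) is assumed to be an acceptable reading of "every fourth pixel".

```python
from typing import List

Frame = List[List[int]]  # one frame as rows of pixel values (hypothetical representation)


def downsample(frame: Frame, step: int) -> Frame:
    """Block 331: keep every `step`-th pixel in each direction."""
    return [row[::step] for row in frame[::step]]


def drop_frames(frames: List[Frame], keep_every: int) -> List[Frame]:
    """Block 332: reduce the frame rate by keeping one frame in `keep_every`."""
    return frames[::keep_every]


def composite(thumbnails: List[Frame]) -> Frame:
    """Block 333: place corresponding thumbnails next to each other, row by row."""
    height = len(thumbnails[0])
    return [sum((thumb[r] for thumb in thumbnails), []) for r in range(height)]


def thumbnail_stream(camera_streams: List[List[Frame]],
                     step: int = 2, keep_every: int = 1) -> List[Frame]:
    """Build the stream of composite thumbnail images from all camera streams."""
    reduced = [drop_frames([downsample(f, step) for f in stream], keep_every)
               for stream in camera_streams]
    # zip(*reduced) groups the frames captured at the same instant by each camera.
    return [composite(list(frames)) for frames in zip(*reduced)]
```

With step=2 each thumbnail has one quarter of the original pixel count; a larger step or a keep_every greater than 1 reduces the bandwidth further.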

The details of a first embodiment of the invention are given in Figures 4A to 4F. In this first embodiment of the invention, system 110 includes a server 401 which streams video to a web client 402 as indicated in Figure 4A. The server 401 takes the four input streams A to D from the four cameras 102A to 102D and makes two streams T and F. Stream T is a thumbnail stream, that is, a single stream of images wherein each image in the stream has a thumbnail image from each of the cameras. Stream F is the focus stream, which transmits the high resolution images which appear on the user's display. As shown in Figure 2, the user's display shows the four thumbnail images and a single focus stream.

The web client 402 includes a stream selection control 403. This may for example be a conventional mouse. When the user clicks on one of the thumbnails, a signal is sent to the server 401 and the focus stream F is changed to the stream of images that corresponds to the thumbnail that was clicked. In this embodiment server 401 corresponds to stream control 302 shown in Figure 3 and client 402 includes components 303, 304 and 305 shown in Figure 3. The details of the programs in server 401 and client 402 are shown in Figures 4B to 4E and are described later.

An optional procedure that can be employed to give a user the illusion that the change from one stream to another stream occurs instantaneously is illustrated in Figure 4F. Figure 4F shows a sequence of steps that can take place when the user decides to change the focus stream to a different camera. It is noted that under normal operation, a system receiving streaming video buffers the data at the input of the client system to ensure continuity in the event of a small delay in receiving input. This is a very common practice and it is indicated by block 461. When a command is given to change the focus stream, if the procedure shown in Figure 4F is not used, there will be a delay because when the client begins receiving the new stream, it will not be displayed until the buffer is sufficiently filled. This delay can be eliminated using the technique illustrated in Figure 4F. With this technique, when a viewer issues a command to change the focus stream, the large image on the viewer's screen is immediately changed to an enlarged image from the thumbnail of the camera stream newly requested by the user. This is indicated by block 463. That is, the low resolution thumbnail from the desired camera is enlarged and used as the focus image. This ensures that the focus image changes as soon as the user indicates that a change is desired. The buffer for the focus data stream is flushed and it begins filling with the images from the new focus stream as indicated by blocks 464 and 465. As indicated by block 466, when the buffer is sufficiently full of images from the new stream, the focus image is changed to a high resolution image from this buffer.
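
The switching procedure of Figure 4F can be sketched as a small client-side state machine. The sketch below is illustrative only: the class FocusView, its method names and the min_buffered threshold are hypothetical, frames are opaque values, and the fallback to the enlarged thumbnail when the buffer runs empty is an assumption that the text does not address.

```python
from collections import deque


class FocusView:
    """Chooses what to show in the large focus area of the display."""

    def __init__(self, min_buffered: int = 3):
        self.buffer = deque()            # block 461: input buffer for the focus stream
        self.min_buffered = min_buffered
        self.camera = None
        self.waiting = False             # True while the new stream's buffer is filling

    def switch_to(self, camera, latest_thumbnails):
        """User clicked a thumbnail: show its enlarged image at once (block 463)."""
        self.camera = camera
        self.buffer.clear()              # block 464: flush the old focus data
        self.waiting = True
        return ("enlarged-thumbnail", latest_thumbnails[camera])

    def receive(self, camera, frame):
        """Block 465: buffer frames of the new stream; ignore stale frames."""
        if camera == self.camera:
            self.buffer.append(frame)

    def next_image(self, latest_thumbnails):
        """Return the image to display for the next frame period."""
        if self.waiting and len(self.buffer) >= self.min_buffered:
            self.waiting = False         # block 466: buffer is sufficiently full
        if self.waiting or not self.buffer:
            # Assumption: keep showing the enlarged thumbnail if no frame is ready.
            return ("enlarged-thumbnail", latest_thumbnails[self.camera])
        return ("high-resolution", self.buffer.popleft())
```

The thumbnail stream keeps arriving during the switch, so the enlarged thumbnail continues to move while the high resolution buffer fills.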

As indicated by block 301, the data streams from the cameras are edited before they are sent to users. It is during this editing step that the thumbnail images are created as indicated in Figure 3A. The data streams are also compressed during this editing step. Various known types of compression can be used.

Figure 5 illustrates another type of editing step that may be performed. The entire stream of images from all the cameras need not be streamed to the viewer. As illustrated in Figure 5, sections of the streams, called "clips", can be selected and it is these clips that are sent to a user. In the example of Figure 5, two clips C1 and C2 are made from the video streams A to D. In general the clips would be compressed and stored in a disk file and called up when there is a request to stream them to a user. For example, a brief description of clips showing the key plays from a sporting event can be posted on a web server, and a user can then select which clips are of interest. A selected clip would then be streamed to the user. That is, the thumbnail images and a single focus stream would be sent to the user. The streaming would begin with a default camera view as the focus view. When desired, the user can switch the focus stream to any desired camera by clicking on the appropriate thumbnail. With the first embodiment of the invention, files such as clips are stored on the server in a file with a ".pan" file type. The pan file would have the data stream from each camera and the thumbnail data stream for a particular period of time.

The first embodiment of the invention is made to operate with the commercially available streaming video technology marketed by RealNetworks Inc. located in Seattle, Washington. RealNetworks Inc. markets a line of products related to streaming video including products that can be used to produce streaming video content, products for servers to stream video over the Internet and video players that users can use to receive and watch video which is streamed over the Internet. Figures 4B and 4C show the units 401 and 402 in more detail.

As indicated in Figure 4B, the web server 401 is a conventional server platform such as an Intel processor with an MS Windows NT operating system and an appropriate communications port. The system includes a conventional web server program 412. The web server program 412 can for example be the program marketed by the Microsoft Corporation as the "Microsoft Internet Information Server". A video streaming program 413 provides the facility for streaming video images. The video streaming program 413 can for example be the "RealSystem Server 8" program marketed by RealNetworks Inc. Programs 412 and 413 are commercially available programs. While the programs 412 and 413 are shown resident on a single server platform, these two programs could be on different server platforms. Other programs from other companies can be substituted for the specific examples given. For example the Microsoft Corporation markets a streaming server termed the "Microsoft Streaming Server" and the Apple Corporation markets streaming servers called QuickTime and Darwin.

In the specific embodiment shown, "video clips" are stored on a disk storage sub-system 411. Each video clip has a file type ".pan" and it contains the video streams from each of the four cameras and the thumbnail stream. When the system receives a URL calling for one of these clips, the fact that the clip has a file type ".pan" indicates that the file should be processed by plug in 414.

One of the streams stored in a pan file is a default stream and this stream is sent as the focus stream until the user indicates that another stream should be the focus stream. Plug in 414 processes requests from the user and provides the appropriate T and F streams to streaming server 413, which sends the streams to the user. The components of the plug in 414 are explained later with reference to Figure 4D.

As illustrated in Figure 4C, client 402 is a conventional personal computer with a number of programs. The client 402 includes a Microsoft Windows operating system 422 and a browser program 423. The browser 423 can for example be the Microsoft Internet Explorer browser. Streaming video is handled by a commercially available program 424 marketed under the name "RealPlayer 8 Plus" by RealNetworks Inc. Programs 422, 423 and 424 are conventional commercially available programs. Other similar programs can also be used. For example Microsoft and Apple provide players for streaming video. A plug in 425 for the RealPlayer 424 renders images from pan files, that is, plug in 425 handles the thumbnail and focus data streams and handles the interaction between the client 402 and the plug in 414 in the server 401. The components in plug in 425 are given in Figure 4E.

Figures 4D and 4E are block diagrams of the programming plug ins 414 and 425. Plug in 414 is shown in Figure 4D. When the server encounters a request to stream a file with the file type ".pan", it retrieves this file from disk storage subsystem 411 (unless the file is made available to the server via some other input). The file is then transferred to plug in 414. This is indicated by block 432. Commands from the user, i.e. "clicks" on a thumbnail or other types of input from the user when a pan file is being streamed, are also sent to this plug in 414. As indicated by block 435, plug in 414 selects the thumbnail stream and either a default or a requested stream from the pan file. As indicated by block 437, the thumbnail stream and the selected focus stream are sent to the "RealSystem Server 8" program. In alternate embodiments, other streams are also available in pan files. These other streams are selected and sent to the "RealSystem Server 8" program as appropriate in the particular embodiment.
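
The selection performed by plug in 414 (blocks 432 to 437) can be sketched as follows. This is a hedged illustration, not the actual plug in: the PanFile class is a hypothetical in-memory stand-in for a ".pan" clip, the function stream_clip and the clicks mapping are invented for the example, and all streams are assumed to be frame aligned and of equal length. No RealSystem Server interface is modelled; the generator simply yields the T and F frames that would be handed to the streaming program.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List, Tuple


@dataclass
class PanFile:
    """Hypothetical in-memory view of a .pan clip."""
    camera_streams: Dict[str, List[bytes]]  # one full resolution stream per camera
    thumbnail_stream: List[bytes]           # composite thumbnail frames
    default_camera: str                     # focus stream used until the user clicks


def stream_clip(pan: PanFile, clicks: Dict[int, str]) -> Iterator[Tuple[bytes, bytes]]:
    """Yield (thumbnail frame, focus frame) pairs for each frame period.

    `clicks` maps a frame index to the camera whose thumbnail the user
    clicked at that moment (an assumption made for this sketch).
    """
    focus = pan.default_camera
    for index, thumbnail_frame in enumerate(pan.thumbnail_stream):
        requested = clicks.get(index)
        if requested in pan.camera_streams:   # ignore unknown camera names
            focus = requested
        yield thumbnail_frame, pan.camera_streams[focus][index]
```

Only one camera stream is read per frame period, which is what keeps the outgoing bandwidth at one focus stream plus the thumbnail stream.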

Figure 4E is a block diagram of the programming components in the plug in 425 on the client machine. When the RealPlayer 8 Plus 424 encounters data from a pan file, the data is sent to plug in 425. Figure 4E shows this data as block 451. The stream manager recognizes the different types of data streams and sends the data to an appropriate handler 454A to 454C. Data may be temporarily stored in a cache and hence, as appropriate, the data handler retrieves data from the cache. Each handler is specialized and can handle a specific type of stream. For example one handler handles the thumbnail stream and another handler handles the focus stream. The thumbnail handler divides the composite images in the thumbnail stream into individual images. The handlers use a set of decoding, decompression and parsing programs 455A to 455B as appropriate. The system may include more handlers than shown in the figure if there are more kinds of data streams. Likewise the system may include as many decoder, decompression and parsing programs as required for the different types of streams in a particular embodiment. The brackets between the handlers and the decoders in Figure 4E indicate that any handler can use any appropriate decoder and parser to process image data. The decompressed and parsed data is sent to a rendering program 456 which sends the data to the RealPlayer input port to be displayed. A controller 443 controls gating and timing of the various operations.
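
The dispatching role of the stream manager and the work of the thumbnail handler can be sketched as follows. The sketch is a simplification under stated assumptions: StreamManager, register, dispatch and split_composite are hypothetical names, decoding and caching are omitted, and the composite image is assumed to hold equal-width thumbnails placed left to right.

```python
from typing import Callable, Dict, List

Frame = List[List[int]]  # rows of pixel values (hypothetical representation)


def split_composite(composite: Frame, count: int) -> List[Frame]:
    """Thumbnail handler: divide a composite image into `count` individual images."""
    width = len(composite[0]) // count
    return [[row[i * width:(i + 1) * width] for row in composite]
            for i in range(count)]


class StreamManager:
    """Routes incoming data to the handler registered for its stream type."""

    def __init__(self) -> None:
        self.handlers: Dict[str, Callable] = {}

    def register(self, stream_type: str, handler: Callable) -> None:
        self.handlers[stream_type] = handler

    def dispatch(self, stream_type: str, payload):
        handler = self.handlers.get(stream_type)
        if handler is None:
            raise KeyError(f"no handler for stream type {stream_type!r}")
        return handler(payload)
```

A further handler for a new kind of stream is added with one more register call, which mirrors the remark that the system may include more handlers than shown in the figure.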

It should be clearly noted that the specific examples given in Figures 4A to 4E are merely examples of a first simplified embodiment of the invention. For example, instead of working with a web server, the invention could work with other types of servers such as an intranet server or a streaming media server, or in fact the entire system could be on a single computer with the source material being stored on the computer's hard disk. The interaction between the server 401 and the client 402, and the manner in which the server responds to the client 402, is explained in detail later with reference to Figure 12. It should be noted that all of the components shown in Figures 4A to 4E (other than the server platform and personal computer) are software components.

Figure 6 illustrates the system in a typical setup at a sporting event. The cameras and the sporting event are in stadium 601. The output from the cameras goes to a video production truck 602 which is typically owned by a TV network. Such trucks have patch panels at which the output from the cameras can be made available to equipment in a clip production truck 603. The clip production truck 603 generates the clips and sends them to a web site 604.

Figure 7 is a system overview of this alternate embodiment. The "feed" from stadium cameras 701 goes to patch panel 702 and then to a capture station 703. At station 703 operator 1 makes the clip selections as illustrated in Figure 5. He does this by watching one of the channels and, when he sees interesting action, he begins capturing the images from each of the cameras. The images are recorded digitally with commercially available equipment. Cutting clips from the recorded images can also be done with commercially available equipment such as the "Profile™" and "Kalypso™" Video Production family of equipment marketed by Grass Valley Group Inc., whose headquarters are in Nevada City, California.

As shown in Figure 8, when a clip is selected as indicated at 801, the clip is stored and it is given a name as indicated on display 703. The stored clips are available to the operator of the edit station 704. At the edit station, the clip can be edited, hot spots can be added and voice can be added. Hot spots are an overlay provided on the images such that if the user clicks at a particular position on an image as it is being viewed, some action will be taken. Use of hot spots is a known technology. When the editing is complete the clips are compressed and posted on web site 705.

Figure 9 illustrates what a user sees with another alternate embodiment of the invention. The alternative embodiment illustrated in Figure 9 is designed for use with multiple cameras which record images which can be seamed into a panorama. Cameras which record multiple images which can be seamed into a panorama are well known. For example, see co-pending application serial number 09/338,790, filed 6/23/99 and entitled "A System for Digitally Capturing and Recording Panoramic Movies".

The embodiment shown in Figure 9 is for use with a system that captures six images such as the camera shown in the referenced co-pending application (which is hereby incorporated herein by reference). The six images captured by the camera are a top, a bottom, a left side, a right side, a front and a back image (i.e. there is a lens on each side of a cube). These images can be seamed into a panorama in accordance with the prior art and stored in a format such as an equi-rectangular or cubic format. With this alternative embodiment, the user sees a display such as that illustrated in Figure 9. At the top center of the display is a thumbnail 901 of a panorama. The panoramic image is formed by seaming together into one panoramic image the individual images from the six cameras. Six thumbnails of images from the cameras (the top, bottom, left side, right side, front and back of the cube) are shown along the right and left edges of the display. If a user clicks on any one of the six thumbnails on the right and left of the screen, the focus stream is switched to that image stream as in the first embodiment. It is noted that with a panoramic image, it is usual for a viewer to select a view window and then see the particular part of the panorama which is in the selected view window. If the user clicks anywhere in the panorama 901, the focus stream is changed to a view window into the panorama which is centered at the point where the user clicked. With this embodiment, stream control has as one input a panoramic image and the stream control selects a view window from the panorama which is dependent upon where the user clicks on the thumbnail of the panorama. The image from this view window is then streamed to the user as the focus image.
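
The mapping from a click on the panoramic thumbnail to a view window can be sketched as follows. This is a simplified illustration: the function view_window and its parameters are hypothetical, the panorama is treated as a flat equi-rectangular image from which a rectangle is cropped, horizontal wrap-around is assumed for a full 360 degree panorama, and the perspective correction that a real panoramic viewer would apply is not shown.

```python
from typing import Tuple


def view_window(click_x: int, click_y: int,
                thumb_w: int, thumb_h: int,
                pano_w: int, pano_h: int,
                win_w: int, win_h: int) -> Tuple[int, int, int, int]:
    """Return (left, top, width, height) of a window centred on the clicked point."""
    # Scale the click position from thumbnail coordinates to panorama coordinates.
    centre_x = click_x * pano_w // thumb_w
    centre_y = click_y * pano_h // thumb_h
    # Wrap horizontally, since the left and right edges of a 360 degree panorama meet.
    left = (centre_x - win_w // 2) % pano_w
    # Clamp vertically so the window stays inside the panorama.
    top = min(max(centre_y - win_h // 2, 0), pano_h - win_h)
    return left, top, win_w, win_h
```

A window whose left edge is near the right border of the panorama continues at the left border, so the caller would read the crop in two pieces in that case.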

In other alternative embodiments which show a thumbnail of a panorama as described above, thumbnails from other cameras are provided in addition to (or in place of) the thumbnails of the individual camera views from the cameras which were used to record the panorama. These additional cameras may be cameras which are also viewing the same event, but from a different vantage point. Alternatively they can be from some related event.

A somewhat more complicated alternate embodiment of the invention is shown in Figure 10. In the embodiment illustrated in Figure 10, a server 910 receives eight streams S1 to S8. The eight streams include four streams S5 to S8 that are similar to the video streams described with reference to the previously described embodiment. These four streams include a stream S8 where each image contains a thumbnail of the other images and three video streams designated V1 to V3.

The server selects the streams that are to be streamed to the user as described with the first embodiment of the invention. The selected streams are then sent over a network (for example over the Internet) to the client system.

The additional data streams provided by this embodiment of the invention include an audio stream S4, an interactivity markup stream S3, a presentation markup stream S2 and a telemetry data stream S1. The audio stream S4 provides audio to accompany the video stream. Typically there would be a single audio stream which would be played when any of the video streams is viewed. For example, there may be a play by play description of a sporting event which would be applicable irrespective of which camera is providing the focus stream. However, there could be an audio stream peculiar to each video stream.

The interactivity markup stream S3 describes regions of the presentation which provide for additional user interaction. For example there may be a button and clicking on this button might cause something to happen. The interactivity markup stream consists of a series of encoded commands which give type and position information. The commands can be in a descriptive language such as XML encoded commands or commands encoded in some other language. Such command languages are known and the ability to interpret commands such as XML encoded commands is known.
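
One way such XML encoded commands could be interpreted is sketched below. The markup is entirely hypothetical: the text states only that the commands give type and position information, so the element name hotspot, its attributes and the function names parse_hotspots and hit_test are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional

# Hypothetical markup; the source does not define element or attribute names.
SAMPLE = """<interactivity>
  <hotspot type="button" x="10" y="20" width="50" height="30" action="replay"/>
  <hotspot type="region" x="100" y="40" width="80" height="60" action="player-stats"/>
</interactivity>"""


def parse_hotspots(xml_text: str) -> List[Dict]:
    """Read the type and position information of every hotspot command."""
    root = ET.fromstring(xml_text)
    spots = []
    for element in root.iter("hotspot"):
        spots.append({
            "type": element.get("type"),
            "action": element.get("action"),
            "x": int(element.get("x")),
            "y": int(element.get("y")),
            "width": int(element.get("width")),
            "height": int(element.get("height")),
        })
    return spots


def hit_test(spots: List[Dict], x: int, y: int) -> Optional[Dict]:
    """Return the first hotspot containing the clicked point, if any."""
    for spot in spots:
        inside_x = spot["x"] <= x < spot["x"] + spot["width"]
        inside_y = spot["y"] <= y < spot["y"] + spot["height"]
        if inside_x and inside_y:
            return spot
    return None
```

The client would call hit_test with the click position and then carry out whatever the returned hotspot's action names.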

The presentation markup stream provides an arbitrary collection of time synchronized images and data. For example, the presentation markup stream can provide a background image for the display and provide commands to change this background at particular times. The presentation markup stream may provide data that is static or dynamic. The commands can, for example, be in the form of XML encoded commands.

The telemetry data stream S1 can provide any type of statistical data. For example this stream can provide stock quotes or player statistics during a sporting event. Alternatively the stream could provide GPS codes indicating camera position or it could be video time codes.

Yet another alternate embodiment of the invention is shown in Figure 11. With the embodiment shown in Figure 11, there is not a separate video stream for the thumbnail images. Instead, the thumbnails are transmitted as part of the video streams V1, V2 and V3. A set of the thumbnails is included in each of the video streams. Hence, irrespective of which video stream is selected for the focus stream, the user will have available thumbnails of the other streams. Figure 11A illustrates the display showing an image from the focus stream with the thumbnails on the bottom as part of this image.

A key consideration relative to video streaming is the bandwidth required. If unlimited bandwidth were available, all the data streams would be sent to the client. The present invention provides a mechanism whereby a large amount of data, for example data from a plurality of cameras, can be presented to a user over a limited bandwidth in a manner such that the user can take advantage of the data in all the data streams. The specific embodiments shown relate to data from multiple cameras that are viewing a particular event. However, the multiple streams need not be from cameras. The invention can be used in any situation where there are multiple streams of data which a user is interested in monitoring via thumbnail images. With the invention, the user can monitor the multiple streams via the thumbnail images and then make any particular stream the focus stream, which becomes visible in a high quality image. Depending upon the amount of bandwidth available there could be a large number of thumbnails and there may be more than one focus stream that is sent and shown with a higher quality image.

22 The flowing table shows the bandwidth requirements of various configurations. 
23 

Main Video Size 320x240

Number Video Streams                        2        2        3        3        4        4
Video Stream Vertical                     240      240      240      240      240      240
Video Stream Horizontal                   320      320      320      320      320      320
Thumbnail Vertical                        100      100      100      100      100      100
Thumbnail Horizontal                       75       75       75       75       75       75
Video frame rate                            7       15        7       15        7       15
Color Depth (bits)                         24       24       24       24       24       24
MPEG4 Video Compression ratio             150      150      150      150      150      150
Presentation Video Bandwidth           188832   404640   283248   606960   377664   809280
Shaped Video Bandwidth                 102816   220320   111216   238320   119616   256320

Number Audio Streams                        1        1        1        1        1        1
Audio bitrate                           30000    30000    30000    30000    30000    30000
Presentation Audio Bandwidth            30000    30000    30000    30000    30000    30000

Number Telemetry Streams                    1        1        1        1        1        1
Telemetry bit rate                        500      500      500      500      500      500
Presentation Telemetry Bandwidth          500      500      500      500      500      500

Number Presentation Markup Stream           1        1        1        1        1        1
Markup bitrate                           2500     2500     2500     2500     2500     2500
Presentation Markup Bandwidth            2500     2500     2500     2500     2500     2500

Number Interactivity Markup Stream          1        1        1        1        1        1
Markup bitrate                           1000     1000     1000     1000     1000     1000
Presentation Markup Bandwidth            1000     1000     1000     1000     1000     1000

Presentation Bandwidth (bps)           222832   438640   317248   640960   411664   843280
Presentation Bandwidth (Kbs)           217.61   428.36   309.81   625.94   402.02   823.52
Presentation Bandwidth (KBs)            27.20    53.54    38.73    78.24    50.25   102.94

Shaped Bandwidth                       136816   254320   145216   272320   153616   290320
Shaped Streaming (Kbs)                 133.61   248.36   141.81   265.94   150.02   283.52
Shaped Streaming (KBs)                  16.70    31.04    17.73    33.24    18.75    35.44
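
By way of illustration only, the video and total bandwidth figures in the above table can be reproduced with the following Python sketch. The formulas are inferred from the tabulated values and are not stated in the specification: the presentation figure appears to count every stream at full size plus its thumbnail, while the shaped figure counts one focus stream plus all thumbnails. The function and constant names are hypothetical.

```python
def stream_bps(width, height, fps, color_bits=24, compression=150):
    """Compressed bit rate of one video stream, in bits per second."""
    return width * height * color_bits * fps // compression


def video_bandwidth(num_streams, fps, main=(320, 240), thumb=(100, 75)):
    """Return (presentation, shaped) video bandwidth in bits per second.

    Assumption inferred from the table, not stated in the text:
    presentation = every stream at full size plus its thumbnail,
    shaped       = one full-size focus stream plus all thumbnails.
    """
    full = stream_bps(main[0], main[1], fps)
    small = stream_bps(thumb[0], thumb[1], fps)
    presentation = num_streams * (full + small)
    shaped = full + num_streams * small
    return presentation, shaped


# Audio + telemetry + presentation markup + interactivity markup, per the table.
OVERHEAD_BPS = 30000 + 500 + 2500 + 1000


def total_bandwidth(num_streams, fps):
    """Return (presentation, shaped) total bandwidth in bits per second."""
    presentation, shaped = video_bandwidth(num_streams, fps)
    return presentation + OVERHEAD_BPS, shaped + OVERHEAD_BPS


if __name__ == "__main__":
    for streams in (2, 3, 4):
        for fps in (7, 15):
            print(streams, fps, total_bandwidth(streams, fps))
```

The Kbs rows of the table correspond to the bps totals divided by 1024, and the KBs rows to the Kbs values divided by 8.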



The interaction between the server and the client is illustrated in Figure 12. Figure 12 illustrates the three components of the system. The components are:

The client: The client is operated by a user. It displays the presentation content received from the server. It instructs the server to change focus streams, play forward, play backwards, fast forward, fast reverse, replay, pause and stop.

The server: The server responds to client requests.

The presentation source: The presentation source could be disk storage, a remote server, or a feed from a computer that is generating a presentation from live inputs.

As illustrated in Figure 12, the process begins when the client requests a presentation, as indicated by arrow 991. This creates a server session, and the server begins accessing the presentation from the presentation source, which provides it to the server as indicated by arrow 992. The server then begins streaming this information to the client. At this point the focus stream is a default stream. The client's screen is configured according to the layout information given in the presentation markup stream. For example, this could be XML-encoded description commands in the presentation markup stream. In the example given, at this point the client requests that the focus stream change. This request is sent to the server as indicated by arrow 994.

When the server receives the command, it stops streaming the old focus stream and starts streaming the new focus stream, as indicated by arrow 995. A new layout for the user's display is also sent, as indicated by arrow 996. It is noted that a wide variety of circumstances could cause the server to send the client a new layout for the user's display screen. When the client receives the new display layout, the display is reconfigured.

Arrow 997 indicates that the user can request an end to the streaming operation. Upon receipt of such a request, or when the presentation (e.g. the clip) ends, the server stops the streaming operation and ends access to the presentation source, as indicated by arrows 998. The server also ends the connection to the client, as indicated by arrow 999, and the server session ends. It should be understood that the above example is merely illustrative and a wide variety of different sequences can occur.
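
The sequence of Figure 12 described above can be pictured, purely as a hedged sketch, as a small server-side session object. The class name, method names, and message tuples below are hypothetical and do not appear in the specification; only the arrow numbers in the comments are taken from the text.

```python
class ServerSession:
    """Tracks one client's session; arrow numbers refer to Figure 12."""

    def __init__(self, stream_ids, default_focus):
        self.stream_ids = list(stream_ids)
        self.focus = default_focus   # the focus stream starts as a default stream
        self.streaming = False
        self.log = []                # messages the server would send to the client

    def request_presentation(self):
        """Client requests a presentation (arrow 991); streaming begins."""
        self.streaming = True
        self.log.append(("layout", self.focus))
        self.log.append(("stream", self.focus))

    def change_focus(self, stream_id):
        """Client asks for a different focus stream (arrow 994)."""
        if not self.streaming:
            raise RuntimeError("no active session")
        if stream_id not in self.stream_ids:
            raise ValueError(f"unknown stream: {stream_id}")
        self.log.append(("stop", self.focus))     # old focus stream stops
        self.focus = stream_id
        self.log.append(("stream", self.focus))   # new focus stream (arrow 995)
        self.log.append(("layout", self.focus))   # new display layout (arrow 996)

    def end(self):
        """Client requests an end (arrow 997), or the clip ends (998, 999)."""
        self.streaming = False
        self.log.append(("end", None))


if __name__ == "__main__":
    session = ServerSession(["cam1", "cam2", "cam3"], default_focus="cam1")
    session.request_presentation()
    session.change_focus("cam3")
    session.end()
    print(session.log)
```

As the text notes, other orderings are possible; the sketch covers only the one sequence walked through above.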

Another embodiment of the invention operates by sending base information to create the thumbnail images and additional information to create the focus image. The user sees the same display with this embodiment as with the previously described embodiments; however, this embodiment uses less bandwidth. With this embodiment, the focus data stream is not a stream of complete images. Instead, the focus stream is merely additional information that can be added to the information in one of the thumbnail images to create a high resolution image. The thumbnail images provide basic information which creates a low resolution thumbnail. The focus stream provides additional information which can be added to the information in a thumbnail to create a large, high resolution image.

The following table illustrates the bandwidth savings:
Main Video Size 320x240

                                      Previously described    Using Base and
                                      embodiment              Enhancement Layers

Number of Input Video Streams         3                       3
Number Base Layer Streams             0                       3
Number Enhancement Layer Streams      0                       3
Video Stream Vertical                 240                     240
Video Stream Horizontal               320                     320
Thumbnail Vertical                    75                      75
Thumbnail Horizontal                  100                     100
Video frame rate                      15                      15
Color Depth (bits)                    24                      24
MPEG4 Video Compression ratio         150                     150
Presentation Video Bandwidth          606960                  552960
Shaped Video Bandwidth                238320                  184320

Number Audio Streams                  1                       1
Audio bitrate                         30000                   30000
Presentation Audio Bandwidth          30000                   30000

Number Telemetry Streams              1                       1
Telemetry bit rate                    500                     500
Presentation Telemetry Bandwidth      500                     500

Number Presentation Markup Stream     1                       1
Markup bitrate                        2500                    2500
Presentation Markup Bandwidth         2500                    2500

Number Interactivity Markup Stream    1                       1
Markup bitrate                        1000                    1000
Presentation Markup Bandwidth         1000                    1000

Presentation Bandwidth (bps)          640960                  586960
Presentation Bandwidth (Kbs)          625.94                  573.20
Presentation Bandwidth (KBs)          78.24                   71.65

Shaped Bandwidth                      272320                  218320
Shaped Streaming (Kbs)                265.94                  213.20
Shaped Streaming (KBs)                33.24                   26.65
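
The specification does not say how the additional (enhancement) information is derived from the full image. One common approach, assumed here purely for illustration, is to send the difference between the full-size image and an enlarged copy of the thumbnail. The sketch below uses small grayscale images held as lists of rows, assumes the image dimensions are exact multiples of the scale factor, and uses hypothetical function names.

```python
def downsample(image, factor=2):
    """Base layer: integer average of each factor x factor block."""
    height, width = len(image), len(image[0])
    return [
        [
            sum(image[y + dy][x + dx]
                for dy in range(factor) for dx in range(factor))
            // (factor * factor)
            for x in range(0, width, factor)
        ]
        for y in range(0, height, factor)
    ]


def upsample(base, factor=2):
    """Nearest-neighbour enlargement of the base layer."""
    return [
        [base[y // factor][x // factor]
         for x in range(len(base[0]) * factor)]
        for y in range(len(base) * factor)
    ]


def enhancement(image, base, factor=2):
    """Enhancement layer: what must be added to the enlarged base layer."""
    enlarged = upsample(base, factor)
    return [[p - u for p, u in zip(row, up_row)]
            for row, up_row in zip(image, enlarged)]


def reconstruct(base, enh, factor=2):
    """Client side: enlarged thumbnail plus enhancement data."""
    enlarged = upsample(base, factor)
    return [[u + e for u, e in zip(up_row, enh_row)]
            for up_row, enh_row in zip(enlarged, enh)]


if __name__ == "__main__":
    full = [[10, 12, 50, 52],
            [14, 16, 54, 56],
            [90, 92, 30, 32],
            [94, 96, 34, 36]]
    thumb = downsample(full)            # sent for every view
    extra = enhancement(full, thumb)    # sent only for the focus view
    print(reconstruct(thumb, extra) == full)
```

In this scheme the thumbnail data is never sent twice: the focus view reuses the base layer the client already holds for that thumbnail.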



Subdividing the image data can further reduce bandwidth by allowing optimized compression techniques to be used on each subdivision. Subdivisions may be made by any desirable feature of the imagery, such as pixel regions, foreground/background, frame rate, color depth, resolution, detail type, etc., or any combination of these. Each data stream can be compressed using a technique that preserves the highest quality for a given bandwidth given its data characteristics. The result is a collection of optimally compressed data streams, each containing a component of the resultant images. With this embodiment, each thumbnail image stream is constructed on the client by combining several of these data streams, and its corresponding focus image stream is constructed on the client by combining the thumbnail streams (or the thumbnail images themselves) with additional data streams.

For example, consider a multiple view video that consists of different views of live action characters superimposed against the same static background image. The client sees a low-resolution thumbnail stream for each view and a high-resolution focus stream of one of them. These view streams could be compressed as described before, with a low-resolution thumbnail stream and additional data streams for turning them into high-resolution focus streams. However, additional bandwidth savings can be realized if two features of the image streams are utilized: a) the frame rate of the background image is different from that of the foreground; specifically, the background image is static throughout the entire presentation, so only one image of it ever needs to be sent regardless of how many image frames the presentation contains, and b) the same background image is used for all the view streams, so only one copy of the background image needs to be sent and can be reused by all the view streams. In order to realize these bandwidth savings, a foreground/background subdivision may be made to the video data in the following way:
   a) A data stream containing a single low-resolution background image that is reused to generate all the thumbnail images.
   b) Data streams containing low-resolution foreground images for the thumbnail views, one stream per view.
   c) A data stream containing additional data to boost the low-resolution background image to become the high-resolution background image.
   d) Data streams containing additional data for boosting the low-resolution foreground images to become high-resolution foreground images.

In this embodiment, each image in the thumbnail stream is generated on the client by combining the low-resolution background image with the appropriate low-resolution foreground image. Each image in the focus stream is generated on the client by: adding the additional background image data to the low-resolution background image to generate the high-resolution background image, adding the additional foreground image data to the low-resolution foreground image to generate the high-resolution foreground image, and then combining the high-resolution foreground and background images to generate the final focus-stream image.
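
A minimal sketch of the client-side combining step for the thumbnails is given below. It assumes, for illustration only, that each foreground image marks pixels not covered by a character with a transparent value; the specification does not describe how foreground pixels are distinguished, and the names used are hypothetical.

```python
TRANSPARENT = None  # assumed marker for "no foreground at this pixel"


def composite(background, foreground):
    """Overlay a foreground image on a background image of the same size."""
    return [
        [bg if fg is TRANSPARENT else fg for bg, fg in zip(bg_row, fg_row)]
        for bg_row, fg_row in zip(background, foreground)
    ]


def build_thumbnails(background, foregrounds):
    """Reuse the single shared background for every view's thumbnail."""
    return {view: composite(background, fg) for view, fg in foregrounds.items()}


if __name__ == "__main__":
    shared_background = [[1, 1], [1, 1]]          # sent once
    per_view_foregrounds = {                      # one stream per view
        "view_a": [[None, 9], [None, None]],
        "view_b": [[None, None], [7, None]],
    }
    print(build_thumbnails(shared_background, per_view_foregrounds))
```

The focus image would be built the same way after the additional background and foreground data have been added to their low-resolution counterparts.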

As another example, consider a video where each stream contains a view of a subject against a blurry background, such as one might see at a sporting event where a cameraman has purposely selected camera settings that allow the player to be in crisp focus while the crowd behind the player is significantly blurred. The client sees a low-resolution thumbnail stream for each view and a high-resolution focus stream of one of them. These views could be compressed with a quality setting chosen to preserve the detail in the player. However, bandwidth savings could be realized by utilizing the fact that the blurry crowd behind the player is unimportant to the viewer and can therefore be of lower quality. In order to realize these bandwidth savings, a pixel region subdivision can be made to the image data in the following way:
   a) A data stream containing the player region in low resolution, for the thumbnail images.
   b) A data stream containing the remaining image region in low resolution, for the thumbnail images. This image region would be compressed with a lower quality than that used for the player region.
   c) An additional data stream, one per focus view, for boosting the low-resolution player region into a high-resolution player region.
   d) An additional data stream, one per focus view, for boosting the remaining image region from low resolution to high resolution. This image region would be compressed with a lower quality than that used for the player region.

Each image in the thumbnail stream is generated on the client by combining the player region with the rest of that image. Each image in the focus stream is generated on the client by: adding the additional player region data to the low-resolution player image to generate the high-resolution player image, adding the additional remaining image data to the low-resolution remaining image region to generate the high-resolution remaining image region, and then combining the two regions to generate the final focus-stream image.
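
The idea of giving the two regions different quality can be sketched as follows. The specification does not name a compression method, so uniform quantization is used here only as a stand-in for "lower quality": a larger step keeps fewer distinct pixel levels. The mask, step sizes, and function names are assumptions made for the example.

```python
def quantize(value, step):
    """Round a pixel value down to a multiple of step (coarser = lower quality)."""
    return (value // step) * step


def encode_regions(image, player_mask, player_step=4, rest_step=32):
    """Quantize the player region finely and the remaining region coarsely."""
    return [
        [
            quantize(pixel, player_step if in_player else rest_step)
            for pixel, in_player in zip(row, mask_row)
        ]
        for row, mask_row in zip(image, player_mask)
    ]


if __name__ == "__main__":
    image = [[100, 101], [200, 77]]
    mask = [[True, False], [False, True]]   # True marks the player region
    print(encode_regions(image, mask))
```

In a real codec the two regions would be carried as separate data streams, as in items a) to d) above; the sketch keeps them in one array only to show the effect on the pixels.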

As another example, consider a video where each stream contains fast-moving objects that are superimposed on slowly changing backgrounds. The client sees a low-resolution thumbnail stream for each view and a high-resolution focus stream of one of them. Each stream of video could use a frame rate that allows the fast-moving object to be displayed smoothly. However, bandwidth savings could be realized by utilizing the fact that the slowly changing background differs little from one frame to the next, while the fast-moving object differs significantly from one frame to the next. In order to realize these bandwidth savings, a pixel region subdivision must be made to the image data in the following way:
   a) A data stream containing the fast-moving object regions in low resolution, for the thumbnail images. This stream uses a fast frame rate.
   b) A data stream containing the remaining image region in low resolution, for the thumbnail images. This stream uses a slower frame rate than that used for the fast-moving object region.
   c) An additional data stream, one per focus view, for boosting the low-resolution fast-moving object region into a high-resolution fast-moving object region. This stream uses a fast frame rate.
   d) An additional data stream, one per focus view, for boosting the remaining image region from low resolution to high resolution. This stream uses a slower frame rate than that used for the fast-moving object region.

In this embodiment, each image in the thumbnail stream is generated on the client by combining the fast-moving object region with the most recent frame of the rest of that image. Each image in the focus stream is generated on the client by: adding the additional fast-moving object region data to the low-resolution fast-moving object image to generate the high-resolution fast-moving object image, adding the additional remaining image data to the low-resolution remaining image region to generate the high-resolution remaining image region, and then combining the high-resolution fast-moving object regions with the most recent frame of the remaining image region to generate the final focus-stream image.
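
The "most recent frame" rule can be sketched as below. It is assumed, for illustration only, that the background stream delivers one frame for every fixed number of foreground frames; the specification says only that the background frame rate is slower. The function and parameter names are hypothetical.

```python
def pair_with_latest_background(foreground_frames, background_frames,
                                background_interval):
    """Pair each foreground frame with the most recent background frame.

    background_frames[i] is assumed to arrive at tick i * background_interval,
    where one tick corresponds to one foreground frame.
    """
    pairs = []
    for tick, foreground in enumerate(foreground_frames):
        index = min(tick // background_interval, len(background_frames) - 1)
        pairs.append((foreground, background_frames[index]))
    return pairs


if __name__ == "__main__":
    fast = ["f0", "f1", "f2", "f3", "f4"]   # fast-moving object region
    slow = ["b0", "b1"]                     # remaining region, every 3rd tick
    for foreground, background in pair_with_latest_background(fast, slow, 3):
        print(foreground, "over", background)
```

Each pair would then be combined into a displayed image in the same manner as the other region-based embodiments.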

As another example, consider a video where each stream contains well-lit subjects in front of a differently lit background that results in a background that is shades of orange. The client sees a low-resolution thumbnail stream for each view and a high-resolution focus stream of one of them. Each stream of video could use the whole images as they are. However, bandwidth savings could be realized by utilizing the fact that the background uses a restricted palette of orange and black hues. In order to realize these bandwidth savings, a pixel region subdivision must be made to the image data in the following way:
   a) A data stream containing the image region that the well-lit subject occupies, for the thumbnail images. Full color data is retained for these images.
   b) A data stream containing the remaining image region in low resolution, for the thumbnail images. For these images, the full color data is discarded and only the brightness value part of the color data is retained, allowing fewer bits of data to be used for these images. Upon decompression, these brightness values will be used to select the appropriate brightness of orange coloration for that part of the image.
   c) An additional data stream, one per focus view, for boosting the low-resolution image of the well-lit subject into a high-resolution image of the well-lit subject. Full color data is retained for this additional data.
   d) An additional data stream, one per focus view, for boosting the remaining image region from low resolution to high resolution. For this additional data, the full color data is discarded and only the brightness value part of the color data is retained, allowing fewer bits of data to be used. Upon decompression, these brightness values will be used to select the appropriate brightness of orange coloration for that part of the image.

In this embodiment, each image in the thumbnail stream is generated on the client by combining the well-lit subject region with the remaining image region, in which the brightness values in the image were used to select the correct brightness of orange color for those parts of the image. Each image in the focus stream is generated on the client by: adding the additional well-lit subject region data to the low-resolution well-lit subject image to generate the high-resolution well-lit subject image, adding the additional remaining image data to the low-resolution remaining image region to generate the high-resolution remaining image region and using the brightness values in the image to select the correct brightness of orange color for those parts of the image, and then combining the high-resolution well-lit subject regions with the remaining image region generated earlier.
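
The brightness-only treatment of the background can be sketched as follows. The specification does not say how brightness is computed or which orange is used; the sketch assumes the ITU-R BT.601 luma weights for brightness and an arbitrary reference orange of (255, 165, 0), both of which are illustrative choices, as are the function names.

```python
ORANGE = (255, 165, 0)  # assumed reference hue; the text only says "orange"


def to_brightness(rgb):
    """Encoder side: reduce a 24-bit RGB pixel to one 8-bit brightness value."""
    red, green, blue = rgb
    return round(0.299 * red + 0.587 * green + 0.114 * blue)


def to_orange(brightness):
    """Decoder side: scale the reference orange by the stored brightness."""
    return tuple(round(channel * brightness / 255) for channel in ORANGE)


if __name__ == "__main__":
    background_pixel = (200, 120, 30)
    stored = to_brightness(background_pixel)   # 8 bits instead of 24
    print(stored, to_orange(stored))
```

Pixels in the well-lit subject region would bypass both functions and keep their full color data.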

While the invention has been shown and described with respect to a plurality of preferred embodiments, it will be appreciated by those skilled in the art that various changes in form and detail may be made without departing from the spirit and scope of the invention. The scope of applicant's invention is limited only by the appended claims.

I claim:

1) A system for capturing and displaying images comprising,
a plurality of video cameras viewing an event,
digital storage for storing the outputs from said cameras,
a first edit station with access to said stored video for selecting clips from said video streams,
a second edit station for editing the output of said first edit station,
a web site for storing the edited clips,
a user browser,
a packaging program for creating two video streams, one of which is a focus stream and one of which contains a thumbnail of the images from each camera,
a control device at said browser whereby a user can signal to said web site as to which video stream should be the focus stream.

2) A system for displaying to a user a selected one of a plurality of video streams, said selected video stream being a focus stream, said system comprising,
a client system which can display said selected video stream, and a composite video containing a thumbnail image of each of said plurality of video streams,
a server which receives a plurality of video streams, and said composite video stream, and which provides a selected one of said video streams and said composite video stream to said client system, and
an input device connected to said client system whereby a user can select one of said thumbnails thereby sending a signal to said server indicating which of said plurality of video streams should be sent to said client system.

3) The system recited in claim 2 wherein said server also sends a presentation markup stream to said client machine to control the presentation of images by said client machine.

4) The system recited in claim 2 wherein said server also sends an audio stream to said client machine.

5) The system recited in claim 2 wherein said server also sends an interactivity markup stream to said client system to describe regions of the presentation that provide additional user interaction with said system.

6) The system recited in claim 2 wherein said server also sends a stream of telemetry data to said client machine.

7) The system recited in claim 2 wherein said server also sends an audio stream, a presentation markup stream, and an interactive markup stream to said client machine.

8) A method of streaming selected data from a plurality of cameras to a user who is viewing a display on a client machine comprising the steps of:
streaming to said client machine a focus stream containing the images from a particular one of said cameras and a second video stream, each image in which contains a thumbnail of the images from each of said cameras, and
responding when a user selects a thumbnail of the images from a selected camera by making said focus stream the images from said selected camera.

9) The method recited in claim 8 wherein a presentation markup stream is sent to said client machine to control the display of images on said client machine.

10) The method recited in claim 8 wherein an audio stream is also sent to said client machine.

11) The method recited in claim 8 wherein an interactivity markup stream is sent to said client system to describe regions of the presentation that provide additional user interaction with said system.

12) The method recited in claim 8 wherein a stream of telemetry data is also sent to said client machine.

13) A system for displaying video images comprising,
a server which has available a plurality of data streams,
a client,
first means for streaming a first video stream from said server to said client and for simultaneously streaming a second video stream from said server to said client, said second stream consisting of composite images each of which includes a plurality of low resolution images,
means at said client for receiving said streams and for displaying to a user a high resolution image and a plurality of thumbnails which indicate other streams which are available at said server,
means for allowing said user to indicate which of the streams indicated by said thumbnails, said user would like to form said focus stream.

14) The system recited in claim 13 wherein said server and said client are located on one physical computer system.

15) The system recited in claim 2 wherein said server and said client are located on one physical computer system.

16) The system recited in claim 13 wherein said first stream contains base data which can form low resolution thumbnail images and said second stream contains enhancement data which can be added to a low resolution image to form an enhanced image.



WO 01/89221 



PCT/US01/16289 



17) The system recited in claim 13 wherein one of said low resolution images is a panoramic image and wherein said high resolution image can be a view window from said panorama.

18) The system recited in claim 2 wherein both of said composite images and said high resolution images are contained in the same data stream.

19) The system recited in claim 13 wherein a background image is streamed to said client and said high resolution image and said thumbnail images are superimposed over said background image.

20) The system recited in claim 13 wherein said high resolution images presented to said user include hot spots which can be used to activate commands.